95 research outputs found

    Recognizing point clouds using conditional random fields

    Detecting objects in cluttered scenes is a necessary step for many robotic tasks and facilitates the interaction of the robot with its environment. Owing to the availability of efficient 3D sensing devices such as the Kinect, methods for recognizing objects in 3D point clouds have gained importance in recent years. In this paper, we propose a new supervised learning approach for the recognition of objects from 3D point clouds using Conditional Random Fields (CRFs), a type of discriminative, undirected probabilistic graphical model. The various features and contextual relations of the objects are described by the potential functions in the graph. Our method allows for learning and inference from unorganized point clouds of arbitrary size and shows a significant benefit in computational speed during prediction when compared to a state-of-the-art approach based on constrained optimization.
    Peer Reviewed. Postprint (author's final draft).
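    The structure described above — unary potentials for per-segment features and pairwise potentials for contextual relations — can be made concrete with a small sketch. The following is a minimal illustration only, not the authors' implementation: a pairwise CRF energy over point-cloud segments, minimized greedily with iterated conditional modes (ICM). The `unary` and `pairwise` tables are hypothetical stand-ins for the learned potentials.

```python
import numpy as np

def crf_energy(labels, unary, edges, pairwise):
    """Energy of a labeling under a pairwise CRF.

    labels   : (N,) int array, one label per segment
    unary    : (N, L) array of unary potentials (from local features)
    edges    : list of (i, j) pairs encoding contextual relations
    pairwise : (L, L) array, cost of a label pair on an edge
    """
    e = unary[np.arange(len(labels)), labels].sum()
    for i, j in edges:
        e += pairwise[labels[i], labels[j]]
    return e

def icm(unary, edges, pairwise, iters=10):
    """Iterated conditional modes: greedy coordinate-wise minimization."""
    n, _ = unary.shape
    labels = unary.argmin(axis=1)          # start from the best unary label
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    for _ in range(iters):
        for i in range(n):
            costs = unary[i].copy()
            for j in nbrs[i]:              # add context from neighbors
                costs += pairwise[:, labels[j]]
            labels[i] = costs.argmin()
    return labels
```

    Exact inference on such models is generally hard; ICM is only the simplest possible inference scheme and stands in for whatever inference the paper actually uses.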

    Action recognition based on efficient deep feature learning in the spatio-temporal domain

    Hand-crafted feature functions are usually designed based on domain knowledge of a presumably controlled environment and often fail to generalize, as the statistics of real-world data cannot always be modeled correctly. Data-driven feature learning methods, on the other hand, have emerged as an alternative that often generalizes better in uncontrolled environments. We present a simple yet robust 2D convolutional neural network, extended to a concatenated 3D network, that learns to extract features from the spatio-temporal domain of raw video data. The resulting network model is used for content-based recognition of videos. Relying on a 2D convolutional neural network allows us to exploit a pretrained network as a descriptor that yielded the best results on the large and challenging ILSVRC-2014 dataset. Experimental results on commonly used benchmark video datasets demonstrate that our results are state-of-the-art in terms of accuracy and computational time without requiring any preprocessing (e.g., optic flow) or a priori knowledge of the data capture (e.g., camera motion estimation), which makes the approach more general and flexible than others. Our implementation is made available.
    Peer Reviewed. Postprint (author's final draft).
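    The abstract does not detail the network architecture, so no attempt is made to reproduce it here. As a toy illustration of the basic operation behind extending 2D filtering into the spatio-temporal domain, a "valid" 3D convolution over a (T, H, W) video volume can be sketched as:

```python
import numpy as np

def conv3d(volume, kernel):
    """Naive 'valid' 3D convolution (no padding, stride 1) of a
    spatio-temporal volume with a single kernel."""
    t, h, w = kernel.shape
    T, H, W = volume.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                # correlate the kernel with one (t, h, w) window
                out[i, j, k] = np.sum(volume[i:i + t, j:j + h, k:k + w] * kernel)
    return out
```

    A real implementation would of course use a deep-learning framework's batched, GPU-backed 3D convolution; the triple loop is only meant to show what the operation computes.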

    A local algorithm for the computation of image velocity via constructive interference of global Fourier components

    A novel Fourier-based technique for local motion detection from image sequences is proposed. In this method, the instantaneous velocities of local image points are inferred directly from the global 3D Fourier components of the image sequence, by selecting those velocities for which the superposition of the corresponding Fourier gratings leads to constructive interference at the image point. Hence, image velocities can be assigned locally even though they are computed from the phases and amplitudes of global Fourier components (spanning the whole image sequence) that have been filtered based on the motion-constraint equation, reducing certain aperture effects that typically arise from windowing in other methods. Regularization is introduced for sequences with smooth flow fields, and aperture effects and their influence on optic-flow regularization are investigated in this context. The algorithm is tested on both synthetic and real image sequences, and the results are compared to those of other local methods. Finally, we show that other motion features, such as motion direction, can be computed within the same algorithmic framework without requiring an intermediate representation of local velocity, which is an important characteristic of the proposed method.
    Postprint (author's final draft).
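    To make the selection rule concrete: a Fourier grating with spatial frequency (kx, ky) and temporal frequency w is consistent with an image velocity (u, v) exactly when the motion-constraint equation u*kx + v*ky + w = 0 holds. The sketch below is a simplified, global version of that filtering step, not the paper's local algorithm: it scores candidate velocities by the energy carried by the consistent components.

```python
import numpy as np

def velocity_scores(seq, candidates):
    """Score candidate velocities (u, v) for a (T, H, W) image sequence
    by the total Fourier energy on the motion-constraint plane
    u*kx + v*ky + w = 0 (a global score; a local method would
    re-synthesize the selected gratings at each image point)."""
    T, H, W = seq.shape
    F = np.fft.fftn(seq)
    w = np.fft.fftfreq(T).reshape(T, 1, 1)   # temporal frequency (cycles/frame)
    ky = np.fft.fftfreq(H).reshape(1, H, 1)  # vertical spatial frequency
    kx = np.fft.fftfreq(W).reshape(1, 1, W)  # horizontal spatial frequency
    scores = []
    for u, v in candidates:
        # components on the constraint plane interfere constructively
        # for this velocity
        mask = np.abs(u * kx + v * ky + w) < 1e-6
        scores.append(np.abs(F[mask]).sum())
    return scores
```

    For a single translating grating, only the correct velocity collects the energy of the two spectral peaks; real sequences would need a tolerance band around the plane rather than an exact match.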

    Consistent depth video segmentation using adaptive surface models

    We propose a new approach for the segmentation of 3D point clouds into geometric surfaces using adaptive surface models. Starting from an initial configuration, the algorithm converges to a stable segmentation through a new iterative split-and-merge procedure, which includes an adaptive mechanism for the creation and removal of segments. This allows the segmentation to adjust to changing input data over the course of the video, leading to stable, temporally coherent, and traceable segments. We tested the method on a large variety of data acquired with different range imaging devices, including a structured-light sensor and a time-of-flight camera, and successfully segmented the videos into surface segments. We further demonstrated the feasibility of the approach using quantitative evaluations based on ground-truth data.
    This research is partially funded by the EU project IntellAct (FP7-269959), the Grup Consolidat 2009 SGR155, the project PAU+ (DPI2011-27510), and the CSIC project CINNOVA (201150E088). B. Dellen acknowledges support from the Spanish Ministry of Science and Innovation through a Ramón y Cajal program.
    Peer Reviewed.
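    The split-and-merge idea generalizes well beyond depth data. As a hedged, one-dimensional toy (line segments as the "surface models", a fixed error tolerance `tol` instead of the paper's adaptive mechanism), one adaptation pass can look like:

```python
import numpy as np

def fit_err(y, a, b):
    """RMS error of a line fitted to y[a:b] (the 1D 'surface model')."""
    x = np.arange(a, b)
    p = np.polyfit(x, y[a:b], 1)
    return np.sqrt(np.mean((np.polyval(p, x) - y[a:b]) ** 2))

def split_and_merge(y, tol):
    """Split segments the model fits poorly; merge neighbors a single
    model still explains; iterate until the segmentation is stable."""
    segs = [(0, len(y))]
    for _ in range(20):                      # a few adaptation passes
        split = []
        for a, b in segs:                    # split phase
            if b - a >= 6 and fit_err(y, a, b) > tol:
                m = (a + b) // 2
                split += [(a, m), (m, b)]
            else:
                split.append((a, b))
        merged = [split[0]]
        for a, b in split[1:]:               # merge phase
            pa, _ = merged[-1]
            if fit_err(y, pa, b) <= tol:
                merged[-1] = (pa, b)
            else:
                merged.append((a, b))
        if merged == segs:                   # stable segmentation reached
            break
        segs = merged
    return segs
```

    The paper's method additionally carries segments from frame to frame, which is what yields the temporal coherence; this sketch shows only the within-frame split-and-merge loop.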

    Recognizing point clouds using conditional random fields

    Presented at the 22nd International Conference on Pattern Recognition (ICPR 2014), held in Stockholm, Sweden, 24-28 August 2014. The abstract duplicates the entry of the same title above.
    This work was supported by the EU project IntellAct (FP7-269959), the project PAU+ (DPI2011-27510), and the CSIC project CINNOVA (201150E088). B. Dellen was supported by the Spanish Ministry of Science and Innovation via a Ramón y Cajal fellowship.
    Peer Reviewed.

    Volume measurement with a consumer depth camera based on structured infrared light

    The measurement of object volumes is of great importance for many sectors of industry, including agriculture, transportation, production, and forestry. In this paper, we investigate the feasibility of using commercial depth-sensing devices based on structured light, such as the Kinect camera, for volume measurement of medium-sized objects. Using a fixed set-up, depth data are acquired for different views of the object and merged. Volumes are carved using a volume-intersection approach, which is computationally simple and, most importantly, model-free. The performance of the method is evaluated using ground-truth volumes of a benchmark data set of selected objects, and volume-measurement errors are reported for a set of household objects.
    Peer Reviewed. Postprint (published version).
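    Volume intersection is simple enough to sketch directly. The fragment below is an idealized, orthographic version (real camera views require a projective voxel-to-pixel mapping and calibrated poses, which the sketch omits): a voxel survives only if it projects inside every silhouette, and the object volume is the count of surviving voxels times the voxel volume.

```python
import numpy as np

def carve(grid_shape, silhouettes):
    """Model-free volume intersection (space carving) on a voxel grid.

    silhouettes: list of (axis, mask) pairs, where mask is a boolean
    orthographic silhouette of the object seen along that grid axis.
    Returns the boolean occupancy grid of surviving voxels.
    """
    occupied = np.ones(grid_shape, dtype=bool)
    for axis, mask in silhouettes:
        # broadcast the 2D silhouette along its viewing axis and
        # carve away every voxel that falls outside it
        occupied &= np.expand_dims(mask, axis)
    return occupied
```

    Because carving only removes voxels, the result is always an upper bound on the true volume (the visual hull), which is why merging many views tightens the measurement.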

    Joint segmentation and tracking of object surfaces in depth movies along human/robot manipulations

    A novel framework for joint segmentation and tracking of object surfaces in depth videos is presented. Initially, the 3D colored point cloud obtained with the Kinect camera is used to segment the scene into surface patches, defined by quadratic functions. The computed segments, together with their functional descriptions, are then used to partition the depth image of the subsequent frame consistently with the preceding frame. This way, solutions established in previous frames can be reused, which improves the efficiency of the algorithm and the coherency of the segmentations throughout the video. The algorithm is tested on scenes showing human and robot manipulations of objects. We demonstrate that the method can successfully segment and track the human/robot arm and object surfaces along the manipulations. The performance is evaluated quantitatively by measuring the temporal coherency of the segmentations and the segmentation covering using ground truth. The method provides a visual front-end designed for robotic applications and can potentially be used in the context of manipulation recognition, visual servoing, and robot-grasping tasks.
    Peer Reviewed. Postprint (author's final draft).

    Segmenting color images into surface patches by exploiting sparse depth data

    Presented at WACV 2011, held in Kona, USA, 5-7 January. We present a new method for segmenting color images into their composite surfaces by combining color segmentation with model-based fitting that exploits sparse depth data, acquired using time-of-flight (Swissranger, PMD CamCube) and stereo techniques. The main target of our work is the segmentation of plant structures, i.e., leaves, from color-depth images, and the extraction of color and 3D shape information for automating manipulation tasks. Since segmentation is performed in the dense color space, even sparse, incomplete, or noisy depth information can be used; this kind of data often represents a major challenge for methods operating directly in the 3D data space. To achieve our goal, we construct a three-stage segmentation hierarchy by segmenting the color image at different resolutions, assuming that "true" surface boundaries must appear at some point along the segmentation hierarchy. 3D surfaces are then fitted to the color-segment areas using depth data. The segments that minimize the fitting error are selected and used to construct a new segmentation. An additional region-merging and growing stage is then applied to avoid over-segmentation and to label previously unclustered points. Experimental results demonstrate that the method successfully segments a variety of domestic objects and plants into quadratic surfaces. At the end of the procedure, the sparse depth data are completed using the extracted surface models, resulting in dense depth maps. For stereo, the resulting disparity maps are compared with ground truth and the average error is computed.
    This research is partially funded by the EU GARNICS project FP7-247947, the Consolider-Ingenio project CSD2007-00018, and the Catalan Research Commission under 2009SGR155. G. Alenyà and S. Foix were supported by CSIC under a Jae-Doc and a Jae-Pre-Doc fellowship, respectively. B. Dellen acknowledges support from the Spanish Ministry of Science and Innovation via a Ramón y Cajal fellowship.
    Peer Reviewed.
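    Fitting a quadratic surface to the depth points of a color segment reduces to linear least squares, since the model z = a*x^2 + b*y^2 + c*x*y + d*x + e*y + f is linear in its coefficients. A minimal sketch (the exact parameterization and error measure used in the paper are assumptions here):

```python
import numpy as np

def fit_quadratic_surface(pts):
    """Least-squares fit of z = a*x^2 + b*y^2 + c*x*y + d*x + e*y + f
    to an (N, 3) array of 3D points; returns the six coefficients
    (a, b, c, d, e, f) and the RMS fitting error."""
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    # design matrix: one column per basis function of the model
    A = np.column_stack([x * x, y * y, x * y, x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    rms = np.sqrt(np.mean((A @ coeffs - z) ** 2))
    return coeffs, rms
```

    The RMS residual is exactly the kind of fitting error that the hierarchy search can minimize when selecting segments, and evaluating the fitted model at unmeasured pixels is how the sparse depth is completed into a dense map.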

    Robotic leaf probing via segmentation of range data into surface patches

    Presented at the International Conference on Intelligent Robots and Systems (IROS 2012) Workshop on Agricultural Robotics: Enabling Safe, Efficient, Affordable Robots for Food Production, held in Portugal, 7-12 October 2012. We present a novel method for the robotized probing of plant leaves using Time-of-Flight (ToF) sensors. Plant images are segmented into surface patches by combining a segmentation of the infrared intensity image, provided by the ToF camera, with quadratic surface fitting using ToF depth data. Leaf models are fitted to the boundaries of the segments and used to determine probing points and to evaluate the suitability of leaves for being sampled. The robustness of the approach is evaluated by repeatedly placing a specially adapted, robot-mounted SPAD meter on the probing points, which are extracted automatically. The number of successful chlorophyll measurements is counted, and the total time for processing the visual data and probing the plant with the robot is measured for each trial. In case of failure, the underlying causes are determined and reported, allowing a better assessment of the applicability of the method in real scenarios.
    This research is partially funded by the EU GARNICS project FP7-247947, by the Spanish Ministry of Science and Innovation under the projects PAU+ and MIPRCV Consolider-Ingenio CSD2007-00018, and by the Catalan Research Commission. B. Dellen acknowledges support from the Spanish Ministry of Science and Innovation via a Ramón y Cajal program. S. Foix is supported by a PhD fellowship from CSIC's JAE program.
    Peer Reviewed.

    Combining semantic and geometric features for object class segmentation of indoor scenes

    Scene understanding is a necessary prerequisite for robots acting autonomously in complex environments. Low-cost RGB-D cameras such as the Microsoft Kinect have enabled new methods for analyzing indoor scenes and are now ubiquitous in indoor robotics. We investigate strategies for efficient pixelwise object-class labeling of indoor scenes that combine pretrained semantic features, transferred from a large color image dataset, with geometric features computed relative to the room structure, including a novel distance-from-wall feature, which encodes the proximity of scene points to a detected major wall of the room. We evaluate our approach on the popular NYU v2 dataset. Several deep learning models are tested, designed to exploit different characteristics of the data, including feature learning with two different pooling sizes. Our results indicate that combining semantic and geometric features yields significantly improved results for the task of object-class segmentation.
    This research is partially funded by the CSIC project MANIPlus (201350E102) and the project RobInstruct (TIN2014-58178-R).
    Peer Reviewed.
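    Once a major wall has been detected as a plane n·p + d = 0 (the detection step itself is not sketched here), the distance-from-wall feature for each scene point is just the point-to-plane distance. A minimal sketch, assuming a unit normal:

```python
import numpy as np

def distance_from_wall(points, normal, d):
    """Per-point distance to a detected wall plane n.p + d = 0,
    with |normal| = 1; points is an (N, 3) array of scene points."""
    return np.abs(points @ normal + d)
```

    The resulting scalar per pixel can then be stacked alongside the semantic feature channels as one extra geometric input to the labeling model.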